Techniques for improved heavy particle searches with jet substructure 

We present a generic method for improving the effectiveness of heavy particle searches in hadronic
channels at the Large Hadron Collider. By selectively removing, or pruning, protojets from the
substructure provided by a kT-type jet algorithm, we improve the mass resolution for heavy decays
and decrease the QCD background. We show that the protojets removed are typical of soft radiation 
and underlying event contributions, and atypical of accurately reconstructed heavy particles. 

The Large Hadron Collider (LHC) presents at once
great opportunity and great challenge. Many scenarios 
for new physics involve heavy particles that decay, possibly
through a cascade, to Standard Model (SM) light
quarks and gluons. The resulting final states consist 
partly, or even entirely, of jets. If the new particles are 
not too heavy, they may often be produced with sufficient 
boost to appear in a single jet. Thus, in the search for 
new physics at the LHC, identifying those jets that con- 
tain the decay of a heavy particle may be an important 
tool. The key difficulty will be separating this signal from 
the SM background, namely QCD jets. Recently, several 
groups have suggested novel and effective techniques for 
separating hadronic decays of heavy particles from QCD 
making use of the expected differences in the internal 
structure of the jets [1-7]. The procedures
proposed tend to be "top-down" in the sense that they 
are tuned to specific properties of, say, the two-pronged 
decay of a Higgs boson, or the three-pronged decay of a 
top quark. Here we present a related approach, based, of 
course, on the same underlying differences between real 
decays and QCD, but of a simpler nature and intended 
for use in general searches for new (a priori unknown) 
heavy particles. 

While historically the masses of jets have played lit- 
tle role in the analysis of collider data, this is likely to 
change at the LHC [5]. The simplest way to search for 
heavy particle decays into single jets is to look for features
("bumps") in the jet mass distribution for an observed
jet sample. Since QCD lacks any intrinsic scale
beyond ΛQCD, the background will be featureless aside
from statistical fluctuations. Further, if the heavy par- 
ticle decay includes a chain of new heavy particles, it is 
natural to ask whether we can look for evidence of these 
other mass scales in the substructure of the jet. Consider, 
for example, searching for a top quark in a single jet (as 
in [H |51 [S]). (We will use the top quark as a surrogate 
for new particle searches in the studies outlined below.) 
We would not only expect to see an enhancement for 
jet masses near the top quark mass, but we would expect 
correlated evidence of the W boson mass in the substruc- 
ture of the jet. If we are using a recombination algorithm 
such as the kT algorithm, the natural choice is to identify
the W with one of the protojets involved in the final
merging. 

Our aim in this paper is to present a procedure that im- 
proves the effectiveness of this type of search. Our tech- 
nique suppresses systematic effects of the jet algorithm, 
as well as generic features of hadron collider events, 
such as the underlying event. Both effects tend to ob- 
scure the mass scales present in a heavy particle decay 
as observed in a single jet. Our technique narrows the 
structure in the jet and protojet mass distributions for 
jets from heavy particle decays, and reduces the smooth 
background QCD jet mass distribution. The result is 
a substantially increased likelihood of identifying a new 
physics (heavy particle) signal in the measured jet and 
protojet mass distributions. 

Jet algorithms are designed to interpret long-distance 
degrees of freedom observed in the detector in terms of 
short-distance degrees of freedom. The algorithms take 
a set of initial protojets, such as calorimeter towers, and 
group them into jets. Recombination algorithms are a 
special class of jet algorithms that specify a prescription 
to pairwise combine protojets in an iterative procedure, 
eventually yielding jets. This prescription is based on the 
dominant soft and collinear physics in the QCD shower,
so that the algorithm can trace back to objects coming 
from the hard scattering. The pairwise merging scheme 
of recombination algorithms naturally gives substructure 
to a jet, which provides kinematic handles to determine 
whether the jet was produced by QCD alone or a heavy 
particle decay plus QCD. 

A general recombination algorithm uses a distance 
measure $\rho_{ij}$ between protojets to control how they are
merged. A beam distance $\rho_i$ determines when a protojet
should be promoted to a jet. The algorithm iteratively
finds the smallest of the $\rho_{ij}$ and the $\rho_i$. If the smallest is
a $\rho_{ij}$, protojets i and j are merged into a new protojet.
Otherwise, the protojet corresponding to the smallest $\rho_i$
is promoted to a jet. The algorithm terminates when no 
protojets remain. 

For the kT [9] and Cambridge-Aachen (CA) [10] algorithms,
the metrics are

$$
\begin{aligned}
k_T:&\quad \rho_{ij} = \min(p_{Ti}, p_{Tj})\,\frac{\Delta R_{ij}}{D}, \qquad \rho_i = p_{Ti}; \\
\mathrm{CA}:&\quad \rho_{ij} = \frac{\Delta R_{ij}}{D}, \qquad \rho_i = 1;
\end{aligned}
\tag{1}
$$

where $p_{Ti}$ is the transverse momentum of protojet i and
$\Delta R_{ij} = \sqrt{(\phi_i - \phi_j)^2 + (y_i - y_j)^2}$ is a measure of the angle
between two protojets, where φ is the azimuthal angle
around the beam direction and y is the rapidity along
the beam direction. The angular parameter D governs
when protojets should be promoted to jets: it determines
when a protojet's beam distance is less than the distance
to other objects. The substructure arising from this pair- 
wise merging procedure is straightforward. 
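
As a rough illustration of the iterative procedure and the two metrics (kT: pairwise distance min(pTi, pTj) ΔRij/D with beam distance pTi; CA: pairwise distance ΔRij/D with beam distance 1), the following Python sketch clusters a list of four-momenta. It is not the FastJet implementation used in this study. It is a naive loop over all pairs, the helper names (`massless`, `trans_mom`, `delta_r`, `cluster`) are hypothetical, protojets are merged by adding four-momenta (an assumption about the recombination scheme), and every input is assumed to have nonzero pT so that its rapidity is finite.

```python
import math

def massless(pt_val, y, phi):
    """Massless four-momentum (E, px, py, pz) from pT, rapidity and azimuth."""
    return (pt_val * math.cosh(y), pt_val * math.cos(phi),
            pt_val * math.sin(phi), pt_val * math.sinh(y))

def trans_mom(p):
    return math.hypot(p[1], p[2])

def rapidity(p):
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))

def delta_r(a, b):
    """Angular separation in the (rapidity, azimuth) plane."""
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(rapidity(a) - rapidity(b), dphi)

def cluster(protojets, D=1.0, algorithm="kt"):
    """Pairwise recombination; returns the list of jets as four-momenta."""
    protos = [tuple(p) for p in protojets]
    jets = []
    while protos:
        best = None  # (distance, i, j); j is None for a beam distance
        for i, p in enumerate(protos):
            rho_i = trans_mom(p) if algorithm == "kt" else 1.0
            if best is None or rho_i < best[0]:
                best = (rho_i, i, None)
            for j in range(i + 1, len(protos)):
                q = protos[j]
                rho_ij = delta_r(p, q) / D
                if algorithm == "kt":
                    rho_ij *= min(trans_mom(p), trans_mom(q))
                if rho_ij < best[0]:
                    best = (rho_ij, i, j)
        _, i, j = best
        if j is None:
            # Smallest distance is a beam distance: promote protojet to a jet.
            jets.append(protos.pop(i))
        else:
            # Smallest distance is pairwise: merge i and j (j > i, so pop j first).
            merged = tuple(a + b for a, b in zip(protos[i], protos[j]))
            protos.pop(j)
            protos.pop(i)
            protos.append(merged)
    return jets

# Two nearby protojets and one far away in azimuth (illustrative numbers).
example = [massless(100.0, 0.0, 0.0), massless(50.0, 0.2, 0.0),
           massless(80.0, 0.0, 2.5)]
jets_kt = cluster(example, D=1.0, algorithm="kt")
jets_ca = cluster(example, D=1.0, algorithm="ca")
```

In this toy input the two nearby protojets merge first under either metric, and the distant one is promoted to a jet on its own. The sequence of mergings recorded along the way is the substructure discussed in the text.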

In considering the kinematics of the substructure, two 
variables, z and θ, are particularly useful. For a recombination
1, 2 → p, we define

$$
z = \frac{\min(p_{T1}, p_{T2})}{p_{Tp}}, \qquad \theta = \Delta R_{12}. \tag{2}
$$

To identify heavy particle decays reconstructed in a sin- 
gle jet, we are concerned with recombinations that occur 
at large θ, typically the final recombination. In general,
small-θ recombinations are likely to represent the QCD
showering of the decay products. Similarly, small-z, or 
soft, recombinations are typical for a QCD shower. Even 
the large-angle, but small-z, recombinations that can ap- 
pear in jets from a heavy particle decay will be unlikely 
to yield an accurate representation of the decay: if a 
heavy particle decays such that one decay product has a 
much lower pT relative to the others, the parent particle 
is unlikely to be accurately reconstructed. So, while the 
variable z can be an effective discriminator between QCD 
and decays in principle, the substructure found by the jet 
algorithm often does not faithfully represent the differ- 
ing dynamics. Soft radiation, as well as soft contributions 
from the underlying event and pileup, will be present in 
all jets. These contributions to the jet lead to broadened 
mass distributions, especially for kT jets. In addition,
due to the systematic effects of the jet algorithm, these 
soft contributions can often appear in the final recombi- 
nation. This is particularly true for CA jets, because CA 
orders strictly by θ. The large number of soft protojets
ensures that frequently one will appear at a large angle 
in the final recombination. 

We now define a procedure that systematically removes 
these undesirable soft, large-angle recombinations. The
procedure operates by rerunning the algorithm and veto- 
ing on these recombinations, i.e., removing, or pruning, 
them from the substructure of the jet. It is algorith- 
mically similar to others [31 |S], which also modify the 
jet substructure to improve heavy particle identification. 
The key distinction is that pruning is applied to an en- 
tire jet from the bottom up, with no goal of finding a 
particular number of "subjets". The pruning procedure
is:

1. Rerun the jet algorithm on the set of initial protojets
from the original jet, checking for the following
condition in each recombination 1, 2 → p:

$$
z < z_{\rm cut} \quad \text{and} \quad \Delta R_{12} > D_{\rm cut}. \tag{3}
$$

2. If this condition is met, do not merge the two pro- 
tojets 1 and 2 into p. Instead, discard the softer 
protojet and proceed with the algorithm. The re- 
sulting jet is the pruned jet. 
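
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the results in this paper: the function and helper names are hypothetical, protojets are merged by adding four-momenta, the reclustering ignores the beam distance so that all constituents of the jet are combined until a single protojet remains, and every constituent is assumed to have nonzero pT. When `d_cut` is not supplied it is set to the jet mass divided by the jet pT, the adaptive choice adopted in this study.

```python
import math

def massless(pt_val, y, phi):
    """Massless four-momentum (E, px, py, pz) from pT, rapidity and azimuth."""
    return (pt_val * math.cosh(y), pt_val * math.cos(phi),
            pt_val * math.sin(phi), pt_val * math.sinh(y))

def trans_mom(p):
    return math.hypot(p[1], p[2])

def mass(p):
    return math.sqrt(max(p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2, 0.0))

def four_sum(ps):
    return tuple(sum(c) for c in zip(*ps))

def delta_r(a, b):
    ya = 0.5 * math.log((a[0] + a[3]) / (a[0] - a[3]))
    yb = 0.5 * math.log((b[0] + b[3]) / (b[0] - b[3]))
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(ya - yb, dphi)

def prune(constituents, z_cut, algorithm="ca", d_cut=None):
    """Recluster one jet's constituents, vetoing soft, large-angle mergings."""
    protos = [tuple(p) for p in constituents]
    if d_cut is None:
        jet = four_sum(protos)
        d_cut = mass(jet) / trans_mom(jet)
    while len(protos) > 1:
        best = None  # (distance, i, j, delta R)
        for i in range(len(protos)):
            for j in range(i + 1, len(protos)):
                dr = delta_r(protos[i], protos[j])
                rho = dr
                if algorithm == "kt":
                    rho *= min(trans_mom(protos[i]), trans_mom(protos[j]))
                if best is None or rho < best[0]:
                    best = (rho, i, j, dr)
        _, i, j, dr = best
        a, b = protos[i], protos[j]
        merged = four_sum([a, b])
        z = min(trans_mom(a), trans_mom(b)) / trans_mom(merged)
        protos.pop(j)  # j > i, so remove j first
        protos.pop(i)
        if z < z_cut and dr > d_cut:
            # Veto the merging: keep the harder protojet, discard the softer.
            protos.append(a if trans_mom(a) >= trans_mom(b) else b)
        else:
            protos.append(merged)
    return protos[0]

# Two hard prongs plus one soft, wide-angle constituent (illustrative numbers).
toy_jet = [massless(200.0, 0.0, 0.0), massless(150.0, 0.6, 0.0),
           massless(5.0, -0.9, 0.5)]
pruned_ca = prune(toy_jet, z_cut=0.10, algorithm="ca")
pruned_kt = prune(toy_jet, z_cut=0.15, algorithm="kt")
```

In this toy jet the soft constituent is removed under either ordering, so both pruned jets consist of the two hard prongs and have a smaller mass than the unpruned jet.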

The pruning procedure involves two parameters, zcut
and Dcut, which determine how small z must be and
the minimum angle ΔR of the recombination for it to
be pruned. In this study we use $D_{\rm cut} = m_J/p_{TJ}$ for
both kT and CA, where $m_J$ is the mass of the originally
identified jet and $p_{TJ}$ is its transverse momentum.
This choice is both adaptive to the properties of the
individual jet and IR safe. Pruning with a smaller Dcut
degrades the mass resolution by significantly pruning the
QCD shower of daughter partons of the heavy particle
decay, and pruning with a larger Dcut does not take full
advantage of the procedure. For the CA algorithm, we
use zcut = 0.10. Because the kT algorithm orders recombinations
partly in z, very small-z recombinations are
not expected at the end of the algorithm. This implies a
more aggressive pruning procedure is needed for the kT
algorithm, so in this study we use zcut = 0.15 for the kT
algorithm. We find that these values of the pruning parameters
yield roughly optimal results, largely insensitive
to small changes in their values [11].

We examine the effects of the pruning procedure in a 
study of top quark reconstruction and separation from 
the QCD background. The top quark serves as a sur- 
rogate for a heavy particle decay at the LHC, and lets 
us learn about the effects of pruning in identifying heavy 
particles. 

We generate events using MadGraph/MadEvent 
v4.4.21 [12] interfaced with Pythia v6.4 [13] for showering
and hadronization. For the QCD background, we
produce a matched sample of 2, 3, and 4 hard partons 
(gluons and the four lightest quarks) using MLM-style 
matching implemented in MadGraph (see, e.g., [2]). We 
use the DWT tune [15] in Pythia to give a "noisy" underlying
event. No detector simulation is performed so
we can isolate the "best case" effects of our method. 

The signal sample is $t\bar{t}$ production with fully hadronic
decays. We generate signal and background samples with
a parton-level hT cut for generation efficiency, where hT
is the scalar sum of all pT in the event. Because we focus
on single jet methods to identify heavy particles, we make
samples defined by criteria on jets instead of events. For
each sample, we select central jets (with pseudorapidity
|η| < 2.5) and divide them into four pT bins: [200, 500],
[500, 700], [700, 900], and [900, 1100] (all in GeV/c).
These bins confine the top quark boost to a narrow range
within each bin and allow us to study the performance
of pruning as the top quark pT varies. For each pT bin
$[p_T^{\min}, p_T^{\max}]$, the parton-level hT cut is $p_T^{\min} - 25$ GeV/c <
$h_T/2$ < $p_T^{\max} + 100$ GeV/c. We take the matching scales
$(Q_{\rm cut}^{\rm ME}, Q_{\rm match})$ to be (20, 30) GeV for the lowest pT bin
and (50, 70) GeV in the other three bins.

From the hadron-level output of Pythia, we group 
final-state particles into "cells" based on the segmentation
of the ATLAS hadronic calorimeter (Δη = 0.1,
Δφ = 0.1 in the central region). We sum the four-momenta
of all particles in each cell and rescale the resulting
three-momentum to make the cell massless. After
a threshold cut on the cell energy of 1 GeV, cells become 
the inputs to the jet algorithm. Our implementation of 
recombination algorithms uses FastJet [16].
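
The cell construction described above can be sketched as follows. This is an illustrative Python version, not the code used in this study: the names `make_cells` and `massless_particle` are hypothetical, the grid is a uniform 0.1 × 0.1 binning in pseudorapidity and azimuth with no special treatment of the azimuthal wrap-around or of the non-central region, and no particle is assumed to travel exactly along the beam axis.

```python
import math
from collections import defaultdict

def massless_particle(E, eta, phi):
    """Massless four-momentum (E, px, py, pz) from energy and direction."""
    pt_val = E / math.cosh(eta)
    return (E, pt_val * math.cos(phi), pt_val * math.sin(phi),
            pt_val * math.sinh(eta))

def make_cells(particles, d_eta=0.1, d_phi=0.1, e_min=1.0):
    """Group particles into grid cells and return massless cell four-momenta."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for E, px, py, pz in particles:
        p = math.sqrt(px * px + py * py + pz * pz)
        eta = math.atanh(pz / p)  # pseudorapidity of the particle direction
        phi = math.atan2(py, px)
        key = (math.floor(eta / d_eta), math.floor(phi / d_phi))
        for k, value in enumerate((E, px, py, pz)):
            sums[key][k] += value  # sum four-momenta within the cell
    cells = []
    for E, px, py, pz in sums.values():
        p = math.sqrt(px * px + py * py + pz * pz)
        if E < e_min or p == 0.0:
            continue  # threshold cut on the cell energy
        scale = E / p  # rescale the three-momentum so that |p| = E
        cells.append((E, scale * px, scale * py, scale * pz))
    return cells

# Two particles sharing a cell and one soft particle elsewhere (illustrative).
cells = make_cells([massless_particle(3.0, 0.05, 0.05),
                    massless_particle(2.0, 0.07, 0.03),
                    massless_particle(0.5, 1.03, 1.03)])
```

Here the first two particles fall in the same cell and are combined into a single massless cell of energy 5 GeV, while the 0.5 GeV particle fails the 1 GeV threshold and is dropped.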

To quantify the effects of pruning in top identification 
and background separation, we define criteria for a jet 
to be labeled as reconstructing a top quark decay. For 
either the pruned or unpruned jet, a top jet is one whose 
mass is within the top mass window and whose heavier 
daughter protojet mass is within the W mass window. 
Both windows come from fits to the mass distributions in 
the signal sample, and do not need to be known a priori. 
These are fit using a skewed Breit-Wigner distribution for 
the peak and a power-law continuum for the background. 
These functions are

$$
\text{peak:}\quad f(m) = \frac{M^2\Gamma^2\,[a + b\,(m - M)]}{(m^2 - M^2)^2 + M^2\Gamma^2}, \tag{4}
$$

together with a power-law continuum g(m).

The fitted mass M, which is within a few GeV/c² of
mtop, and the fitted width Γ are the relevant parameters;
the mass window is M ± Γ. These mass windows are in
general different for the pruned and unpruned samples.
In Fig. 1 we plot the top and W window widths for the
kT and CA algorithms for both pruned and unpruned
jets. We refer to the pruned version of algorithm A as
pA.
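
To make the selection concrete, the sketch below evaluates a skewed Breit-Wigner peak of the form f(m) = M²Γ²[a + b(m − M)] / [(m² − M²)² + M²Γ²] and applies the M ± Γ windows to label a jet as a top jet. The function names and all numerical values are hypothetical and chosen only for illustration. In the study itself M and Γ come from fits to the signal mass distributions, and the fit is not reproduced here.

```python
def skewed_breit_wigner(m, M, Gamma, a, b):
    """Peak function with a linear skew factor [a + b (m - M)]."""
    numerator = M ** 2 * Gamma ** 2 * (a + b * (m - M))
    denominator = (m ** 2 - M ** 2) ** 2 + M ** 2 * Gamma ** 2
    return numerator / denominator

def mass_window(M, Gamma):
    """Mass window M +/- Gamma built from the fitted mass and width."""
    return (M - Gamma, M + Gamma)

def is_top_jet(jet_mass, heavier_daughter_mass, top_window, w_window):
    """A top jet has its mass in the top window and its heavier
    daughter protojet mass in the W window."""
    in_top = top_window[0] <= jet_mass <= top_window[1]
    in_w = w_window[0] <= heavier_daughter_mass <= w_window[1]
    return in_top and in_w

# Illustrative (not fitted) window parameters, in GeV/c^2.
top_window = mass_window(172.0, 10.0)
w_window = mass_window(80.0, 8.0)
accepted = is_top_jet(170.0, 82.0, top_window, w_window)
rejected = is_top_jet(170.0, 40.0, top_window, w_window)
```

At m = M the peak function reduces to a, so a sets the height of the peak and b controls its asymmetry.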

FIG. 1: Pruned and unpruned top (a) and W (b) mass window
widths (in GeV/c²) versus pT window center (in GeV/c)
for both kT and CA algorithms.

The top and W mass windows are significantly narrower
for the pruned samples. Moreover, the widths for
the pruned kT and CA algorithms are very similar, unlike
the unpruned case. The narrower widths mean fewer
jets from the QCD samples will be misidentified as tops. 

We now discuss a more quantitative measure of the 
performance of pruning. From the found mass windows 
we count the number of top jets in the signal and background
samples, $N_S(A)$ and $N_B(A)$, for algorithm A. Using
these counts, we define a statistical measure, S,
to quantify how pruning improves top identification and 
separation from QCD backgrounds. S is defined as

$$
S = \frac{N_S(pA)/\sqrt{N_B(pA)}}{N_S(A)/\sqrt{N_B(A)}}, \tag{5}
$$

which is the improvement from pruning in the ratio of 
the signal size to the statistical fluctuations in the back- 
ground, and is a measure of the expected improvement 
in significance of the signal. Values greater than one in- 
dicate an improvement in pruning versus not pruning. 
Note that while the significance of the signal depends on 
the relevant cross sections and the integrated luminosity, 
the improvement measure S does not. Using a constant 
value of D = 1.0 for all pT bins, we plot S in Fig. 2 for
both the kT and CA algorithms.
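
As a small numerical illustration of the measure S, the ratio of signal over the square root of background for pruned jets divided by the same quantity for unpruned jets, the Python sketch below computes it from four counts. The function name and the counts are hypothetical. The second call shows that scaling all four counts by a common factor, as a change in integrated luminosity would, leaves S unchanged.

```python
import math

def improvement(ns_pruned, nb_pruned, ns_unpruned, nb_unpruned):
    """S = [N_S(pA) / sqrt(N_B(pA))] / [N_S(A) / sqrt(N_B(A))]."""
    pruned = ns_pruned / math.sqrt(nb_pruned)
    unpruned = ns_unpruned / math.sqrt(nb_unpruned)
    return pruned / unpruned

# Illustrative counts: pruning keeps the signal but cuts the background by 4.
s_value = improvement(100.0, 25.0, 100.0, 100.0)

# Scaling every count by the same factor does not change S.
s_scaled = improvement(400.0, 100.0, 400.0, 400.0)
```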

FIG. 2: S vs. pT for the CA and kT algorithms, using D =
1.0. Statistical errors, due to limited QCD sample sizes after
cuts, are shown.

For both algorithms, the measure S is in the range 1.2
to 1.4 in the lowest pT bin, and increases with increasing
pT, with a dramatically increased significance in the
range of 3 to 6 in the highest bin. These large values of S
arise partially from using a fixed value of D with varying
pT. The opening angle of the typical top quark decay
varies as ΔR ≈ 2mtop/pT, which is less than D = 1.0
in the larger pT bins. The large D allows for extra radiation
to be merged in the jet, which may sufficiently
alter the order of the substructure reconstruction to render
an actual top decay no longer identifiable as a top
jet. Additionally, a larger D at fixed pT leads to larger
mass QCD jets and enhances the probability to fake top
quarks. In both scenarios the extra radiation included
within the larger D jet is often soft and uncorrelated.
Hence pruning tends to dramatically improve top finding
at large pT in fixed D jets.

In a real search, the mass of the heavy state is not 
known. Once an enhancement in the mass distribution 
has been observed, knowledge of the purported mass can 
be used to tune the analysis parameters, such as D. (Another
approach, using "variable-R" jets, is discussed in
[17].) Even if D is tuned for each pT bin to maximize
the performance of the unpruned algorithm, we would
still expect pruning to show an improvement over the
unpruned case. This can be seen in the lowest bin of
Fig. 2, where the value of D = 1.0 is already roughly
optimal and S is still larger than 1. 

Given that pruning always provides an improvement, 
the relevant question for designing a search procedure 
using single jets is whether pruned, tuned-D jets provide 
much better results than pruned, fixed-D jets. To answer 
this question, we compare signal-to-noise for pruned jets 
with fixed D = 1.0 to the case where D is picked for 
each pT bin to match the typical opening angle of the top
quark decay. In particular, we set D to be approximately
$2m_{\rm top}/p_T^{\min}$, where $p_T^{\min}$ is the lower pT limit for the given
bin, up to a maximum of 1.0. Thus we choose the D
values of {1.0, 0.7, 0.5, 0.4} for our pT bins. This exercise
leads to Fig. 3, where we show a ratio analogous to S
that we call $S_D$. For each pT bin, $S_D$ is the ratio of
signal-to-noise for pruned jets with the value of D from
the above list to signal-to-noise for pruned jets with fixed
D = 1.0. We see that the values of $S_D$ are close to one
for all pT bins. This implies the important result that,
as long as we prune the jets, using a tuned D value for
each pT bin provides little advantage over the simpler
fixed D analysis. Note also that in Fig. 3 the statistical
uncertainties in $S_D$ are on the order of the improvements.
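
The tuned D values quoted above can be checked with a few lines of Python. This is only a sketch: the top mass of 173 GeV/c² and the rounding to one decimal place are assumptions made here to reproduce the word "approximately", and the function name is hypothetical.

```python
M_TOP = 173.0  # GeV/c^2, assumed value for illustration

def tuned_d(pt_min, m_top=M_TOP, d_max=1.0):
    """D matched to the typical opening angle 2 m_top / pT at the lower
    edge of the bin, rounded to one decimal and capped at d_max."""
    return min(d_max, round(2.0 * m_top / pt_min, 1))

pt_bins = [(200, 500), (500, 700), (700, 900), (900, 1100)]  # GeV/c
d_values = [tuned_d(lower) for lower, _ in pt_bins]
```

With these assumptions `d_values` is the list 1.0, 0.7, 0.5, 0.4, matching the values used for the four pT bins.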



This procedure, pruning, removes recombinations un- 
likely to represent an accurately reconstructed heavy par- 
ticle, narrows mass distributions of reconstructed states 
and reduces the QCD background in a given mass bin. 
As we have demonstrated, heavy particle searches can 
benefit from all of these effects. While unpruned jets are 
sensitive to the specific choice of jet algorithm and the 
value of the parameter D, pruning removes much of this 
sensitivity. It is just as effective to use a large D over a 
broad range in m/pT of the heavy state. When searching 
for a particle of unknown mass, pruning allows the use 
of a large fixed D without losing statistical power. 

The effects of pruning, and in general the application
of jet substructure to find heavy particles, require further
study. Pruning must be verified as an effective
component of heavy particle searches at the LHC,
including understanding the impact of using a realistic 
detector. An important test bed for pruning and other 
jet substructure tools will be early validation studies of 
the Standard Model at the LHC, where we expect to be 
able to observe top quarks, W's and Z's in the single 
jet data. Initial studies such as that described here give 
promising indications that these tools will prove useful 
in the search for new physics. 

FIG. 3: $S_D$ vs. pT for the CA and kT algorithms. The line at
$S_D$ = 1 separates the regions where a tuned D helps (above
the line) and does not (below). The lowest pT bin is not shown
because the D value does not change ($S_D$ = 1). Statistical
errors are shown.

In this work, we have introduced a generic procedure 
that modifies jet substructure to improve heavy parti- 
cle identification and separation from QCD backgrounds. 